System and Method for Photorealistic Imaging Workload Distribution

ABSTRACT

A graphics client receives a frame, the frame comprising scene model data. A server load balancing factor is set based on the scene model data. A prospective rendering factor is set based on the scene model data. The frame is partitioned into a plurality of server bands based on the server load balancing factor and the prospective rendering factor. The server bands are distributed to a plurality of compute servers. Processed server bands are received from the compute servers. A processed frame is assembled based on the received processed server bands. The processed frame is transmitted for display to a user as an image.

TECHNICAL FIELD

The present invention relates generally to the field of computer networking and parallel processing and, more particularly, to a system and method for improved photorealistic imaging workload distribution.

BACKGROUND OF THE INVENTION

Modern electronic computing systems, such as microprocessor systems, are often configured to divide a computationally-intensive task into discrete sub-tasks. For heterogeneous systems, some systems employ cache-aware task decomposition to improve performance on distributed applications. As technology advances, the gap between fast local caches and large slower memory widens, and caching becomes even more important. Generally, typical modern systems attempt to distribute work across multiple processing elements (PEs) so as to improve cache hit rates and reduce data stall times.

For example, ray tracing, a photorealistic imaging technique, is a computationally expensive algorithm that usually does not have fixed data access patterns. However, ray tracing tasks can nevertheless have a very high spatial and temporal locality. As such, a cache aware task distribution for ray tracing applications can lead to high performance gains.

But typical ray tracing approaches cannot be configured to take full advantage of cache aware task distribution. For example, current ray tracers decompose the rendering problem by breaking up an image into tiles. Typical ray tracers either expressly distribute these tiles among computational units or greedily reserve the tiles for access by the PEs through work stealing.

Both of these approaches suffer from significant disadvantages. In typical express distribution systems, the additional workload required to manage the distribution of tiles inhibits performance. In some cases, this additional workload can offset any gains achieved through managed distribution.

In typical work-stealing systems, each PE grabs new tiles after it has processed its prior allotment. But since the PEs grab the tiles from a general pool, the tiles are less likely to have a high spatial locality. Thus, in a work-stealing system, the PEs regularly flush their caches with new scene data and are therefore cold for the next frame, completely failing to take any advantage of the task's spatial locality.

BRIEF SUMMARY

The following summary is provided to facilitate an understanding of some of the innovative features unique to the embodiments disclosed and is not intended to be a full description. A full appreciation of the various aspects of the embodiments can be gained by taking into consideration the entire specification, claims, drawings, and abstract as a whole.

A graphics client receives a frame, the frame comprising scene model data. A server load balancing factor is set based on the scene model data. A prospective rendering factor is set based on the scene model data. The frame is partitioned into a plurality of server bands based on the server load balancing factor and the prospective rendering factor. The server bands are distributed to a plurality of compute servers. Processed server bands are received from the compute servers. A processed frame is assembled based on the received processed server bands. The processed frame is transmitted for display to a user as an image.

In an alternate embodiment, a system comprises a graphics client. The graphics client is configured to receive a frame, the frame comprising scene model data; set a server load balancing factor based on the scene model data; set a prospective rendering factor based on the scene model data; partition the frame into a plurality of server bands based on the server load balancing factor and the prospective rendering factor; distribute the plurality of server bands to a plurality of compute servers; receive processed server bands from the plurality of compute servers; assemble a processed frame based on the received processed server bands; and transmit the processed frame for display to a user as an image.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying figures, in which like reference numerals refer to identical or functionally-similar elements throughout the separate views and which are incorporated in and form a part of the specification, further illustrate the embodiments and, together with the detailed description, serve to explain the embodiments disclosed herein.

FIG. 1 illustrates a block diagram showing an improved photorealistic imaging system in accordance with a preferred embodiment;

FIG. 2 illustrates a block diagram showing an improved graphics client in accordance with a preferred embodiment;

FIG. 3 illustrates a block diagram showing an improved compute server in accordance with a preferred embodiment;

FIG. 4 illustrates a high-level flow diagram depicting logical operational steps of an improved photorealistic imaging workload distribution method, which can be implemented in accordance with a preferred embodiment;

FIG. 5 illustrates a high-level flow diagram depicting logical operational steps of an improved photorealistic imaging workload distribution method, which can be implemented in accordance with a preferred embodiment; and

FIG. 6 illustrates a block diagram showing an exemplary computer system that can be configured to incorporate one or more preferred embodiments.

DETAILED DESCRIPTION

The particular values and configurations discussed in these non-limiting examples can be varied and are cited merely to illustrate at least one embodiment and are not intended to limit the scope of the invention.

In the following discussion, numerous specific details are set forth to provide a thorough understanding of the present invention. Those skilled in the art will appreciate that the present invention may be practiced without such specific details. In other instances, well-known elements have been illustrated in schematic or block diagram form in order not to obscure the present invention in unnecessary detail. Additionally, for the most part, details concerning network communications, electro-magnetic signaling techniques, user interface or input/output techniques, and the like, have been omitted inasmuch as such details are not considered necessary to obtain a complete understanding of the present invention, and are considered to be within the understanding of persons of ordinary skill in the relevant art.

As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.

Any combination of one or more computer usable or computer readable medium(s) may be utilized. The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CDROM), an optical storage device, a transmission media such as those supporting the Internet or an intranet, or a magnetic storage device. Note that the computer-usable or computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-usable medium may include a propagated data signal with the computer-usable program code embodied therewith, either in baseband or as part of a carrier wave. The computer usable program code may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.

Referring now to the drawings, FIG. 1 is a high-level block diagram illustrating certain components of a system 100 for improved photorealistic imaging workload distribution, in accordance with a preferred embodiment of the present invention. System 100 comprises a graphics client 110.

Graphics client 110 is a graphics client module or device, as described in more detail in conjunction with FIG. 2, below. Graphics client 110 couples to display 120. Display 120 is an otherwise conventional display, configured to display digitized graphical images to a user.

Graphics client 110 also couples to a user interface 130. User interface 130 is an otherwise conventional user interface, configured to send information to, and receive information from, a user 132. In one embodiment, graphics client 110 receives user input from user interface 130. In one embodiment, user input comprises a plurality of image frames, each frame comprising scene model data, the scene model data describing objects arranged in an image. In one embodiment, user input also comprises camera movement commands describing perspective (or “eye”) movement from one image frame to another.

In the illustrated embodiment, graphics client 110 also couples to network 140. Network 140 is an otherwise conventional network. In one embodiment, network 140 is a gigabit Ethernet network. In an alternate embodiment, network 140 is an Infiniband network.

Network 140 couples to a plurality of compute servers 150. Each compute server 150 is a compute server as described in more detail in conjunction with FIG. 3, below. In the illustrated embodiment, graphics client 110 couples to the compute servers 150 through network 140.

In an alternate embodiment, graphics client 110 couples to one or more compute servers 150 through a direct link 152. In one embodiment, link 152 is a direct physical link. In an alternate embodiment, link 152 is a virtual link, such as a virtual private network (VPN) link, for example.

Generally, in an exemplary operation, described in more detail below, system 100 operates as follows. User 132, through user interface 130, directs graphics client 110 to display a series of images on display 120. Graphics client 110 receives the series of images as a series of digitized image “frames,” for example, by retrieving the series of frames from a storage on graphics client 110 or from user interface 130. Generally, each frame comprises scene model data describing elements arranged in a scene.

For each frame, graphics client 110 partitions the frame into a plurality of server bands, each server band associated with a particular compute server 150, based on a server load balancing factor and a prospective rendering factor. Graphics client 110 distributes the server bands to the compute servers 150. Each compute server 150 (comprising a plurality of processing elements (PEs)) divides the received server bands (received as “raw display bands”) into PE blocks, each PE block associated with a particular PE, based on a PE load balancing factor. In some embodiments, the compute servers 150 divide the server bands into PE blocks based on the PE load balancing factor and prospective rendering information received from the graphics client 110. The compute servers 150 distribute the PE blocks to their PEs.

The PEs process the PE blocks, rendering the raw frame data and performing the computationally intensive work of turning the raw frame data into a form suitable for the target display 120. In photorealistic imaging processing, rendering can include ray tracing, ambient occlusion, and other techniques. The PEs return the processed PE blocks to their parent compute server 150, which assembles the processed PE blocks into a processed display band.

In some embodiments, the compute servers 150 compress the processed display bands for transmission to graphics client 110. In some embodiments, one or more compute servers 150 transmit the processed display bands without additional compression. Each compute server 150 determines the time each of its PEs took to render its PE block and the total rendering time for the entire raw display band.

The compute servers 150 adjust their PE load balancing factor based on the individual rendering times for each PE. In one embodiment, each compute server 150 also reports its total rendering time to graphics client 110.

Graphics client 110 receives the processed display bands and assembles the bands into a processed frame. Graphics client 110 transmits the processed frame to display 120 for display to the user. In one embodiment, graphics client 110 modifies the server load balancing factor based on reported rendering times received from the compute servers 150.

Thus, as described generally above and in more detail below, graphics client 110 distributes unprocessed server bands to compute servers 150 based in part on the relative load between the servers and in part on prospective rendering information received from the user. The compute servers 150 divide the unprocessed server bands into PE blocks based on the relative load between the PEs and the prospective rendering information. The PEs process the blocks, which the compute servers 150 combine into processed bands and return to the graphics client 110. Graphics client 110 assembles the received processed bands into a form suitable for display to a user. Both the compute servers 150 and graphics client 110 use rendering times to adjust load balancing factors dynamically.

As such, system 100 can dynamically distribute the workload among the elements performing computationally intensive tasks. As the frame data changes, certain portions of the frame become more computationally intensive than others, and the system can respond by reapportioning the tasks so as to keep the response times roughly equivalent. As one skilled in the art will understand, roughly equivalent response times indicate a balanced load and help to reduce idle time for the PEs/servers.
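By way of non-limiting illustration, the reactive reapportioning described above might be sketched as follows. The specification does not define how a load balancing factor is represented or updated; this sketch assumes the factor is a list of per-server shares of the frame, and the function name `rebalance_shares` and the `damping` parameter are hypothetical.

```python
def rebalance_shares(shares, render_times, damping=0.5):
    """Move per-server frame shares toward equal rendering times.

    shares       -- current fraction of the frame given to each server (sums to 1)
    render_times -- reported rendering time of each server for the last frame
    damping      -- hypothetical knob: 0 keeps the old shares, 1 jumps to the target
    """
    if len(shares) != len(render_times):
        raise ValueError("one rendering time is required per share")
    if any(t <= 0 for t in render_times):
        raise ValueError("rendering times must be positive")

    # Estimated speed of each server: fraction of the frame rendered per unit time.
    speeds = [s / t for s, t in zip(shares, render_times)]
    total_speed = sum(speeds)

    # Shares proportional to speed would have equalized the last frame's times.
    targets = [v / total_speed for v in speeds]

    # Blend old and target shares so a single noisy frame does not cause a large swing.
    blended = [(1 - damping) * s + damping * t for s, t in zip(shares, targets)]
    norm = sum(blended)
    return [b / norm for b in blended]


# Two servers with equal shares; the first took twice as long, so it receives less.
new_shares = rebalance_shares([0.5, 0.5], [2.0, 1.0])
```

The same update could be applied by a compute server to its PE blocks using per-PE rendering times; that reuse is likewise an assumption of this sketch.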

FIG. 2 is a block diagram illustrating an exemplary graphics client 200 in accordance with one embodiment of the present invention. In particular, client 200 includes control processing unit (PU) 202. Control PU 202 is an otherwise conventional processing unit, configured as described herein. In one embodiment, client 200 is a PlayStation3™ (PS3). In an alternate embodiment, client 200 is an x86 machine. In an alternate embodiment, client 200 is a thin client.

Client 200 also includes load balancing module 204. Generally, control PU 202 and load balancing module 204 partition a graphics image frame into a plurality of bands based on a server load balancing factor and a prospective rendering factor. In particular, in one embodiment, load balancing module 204 is configured to set and modify a server load balancing factor based on server response times and user input. In one embodiment, user input comprises manual server load balancing settings.

In one embodiment, load balancing module 204 divides the frame into bands comprising the frame data, and client 200 transmits the divided frame data to the compute servers for rendering. In an alternate embodiment, client 200 transmits coordinate information demarcating the boundaries of each band in the frame. In one embodiment, the coordinate information comprises coordinates referring to a cached (and commonly accessible) frame.
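As a minimal sketch of the coordinate-based alternative, the record below carries only band boundaries and an identifier for a cached frame, rather than the frame data itself. The specification does not define a message layout; the `BandRegion` type, its field names, and the `band_regions` helper are hypothetical, and horizontal bands described by row ranges are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BandRegion:
    """Hypothetical coordinate record for one server band."""
    frame_id: int    # identifies the cached, commonly accessible frame
    server_id: int   # compute server assigned to this band
    row_start: int   # first row of the band (inclusive)
    row_end: int     # one past the last row of the band (exclusive)


def band_regions(frame_id, frame_height, shares):
    """Turn per-server shares of the frame into contiguous row ranges."""
    regions = []
    start = 0
    for server_id, share in enumerate(shares):
        if server_id == len(shares) - 1:
            end = frame_height  # the last band absorbs any rounding remainder
        else:
            end = min(start + round(share * frame_height), frame_height)
        regions.append(BandRegion(frame_id, server_id, start, end))
        start = end
    return regions


# A 1080-row frame split between three servers; only these records are transmitted.
regions = band_regions(frame_id=7, frame_height=1080, shares=[0.5, 0.25, 0.25])
```

Under these assumptions each compute server would look up the referenced frame in its own cache and render only the rows named in its record.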

Load balancing module 204 is also configured to set and modify a prospective rendering factor based on scene model data, user input, and server response times. In one embodiment, user input comprises camera motion information. In one embodiment, camera motion information comprises a perspective, or camera “eye”, and a movement vector indicating the speed and direction of a change in perspective.

For example, in one embodiment, client 200 accepts user input including camera motion information and is therefore aware of the direction and speed of the eye's motion. In an alternate embodiment, client 200 accepts user input including tracking information for a human user's eye movement, substituting the human user's eye movement for a camera eye movement. As such, load balancing module 204 can adjust the server band partitioning in advance, based on the expected change in computational load across the frame.

That is, one skilled in the art will understand that certain parts of the frame are more computationally intensive than other parts. For example, a frame segment consisting of only a solid, single-color background is much less computationally intensive than a frame segment containing a disco ball reflecting light from multiple sources. Thus, for example, load balancing module 204 could divide the frame into three bands, one band comprising one-half of the disco ball, and two bands each comprising the entire background and one-quarter of the disco ball.
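The disco ball example can be illustrated with a short sketch that places band boundaries so that each band carries roughly the same estimated cost rather than the same number of rows. The per-row cost estimates, the function name `partition_rows_by_cost`, and the nearest-boundary rule are assumptions made for illustration; the specification does not prescribe a partitioning algorithm.

```python
from bisect import bisect_left
from itertools import accumulate


def partition_rows_by_cost(row_costs, num_bands):
    """Split rows into contiguous bands of roughly equal estimated cost.

    Returns a list of (start_row, end_row) pairs with end_row exclusive.
    """
    n = len(row_costs)
    if not 1 <= num_bands <= n:
        raise ValueError("need between 1 and len(row_costs) bands")

    prefix = list(accumulate(row_costs))  # prefix[i] = cost of rows 0..i
    total = prefix[-1]
    bounds = [0]
    for k in range(1, num_bands):
        target = total * k / num_bands
        idx = bisect_left(prefix, target)  # first row whose running cost reaches target
        cost_below = prefix[idx - 1] if idx > 0 else 0
        # Cut before or after row idx, whichever lands closer to the target cost.
        cut = idx if target - cost_below <= prefix[idx] - target else idx + 1
        cut = max(cut, bounds[-1] + 1)         # every band keeps at least one row
        cut = min(cut, n - (num_bands - k))    # leave rows for the remaining bands
        bounds.append(cut)
    bounds.append(n)
    return list(zip(bounds[:-1], bounds[1:]))


# Cheap background rows (cost 1) surround expensive "disco ball" rows (cost 10).
costs = [1] * 4 + [10] * 4 + [1] * 4
bands = partition_rows_by_cost(costs, 3)
```

With these costs the middle band spans only two rows while the outer bands span five each, so the band covering the densest part of the expensive region is the narrowest.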

Further, when the camera eye changes, the scene elements in the frame (e.g., the disco ball) occupy more or less of the frame, in a different location of the frame. In one embodiment, the camera eye movement information includes the direction and velocity of the camera or human eye change, as a “tracking vector.” In an alternate embodiment, the camera eye movement information includes a target scene object, upon which the camera eye is focused, and the target scene object's relative distance from the current perspective point. That is, if the system is aware of a specific object that is the focus of the user's attention, a “target scene object,” the system can predict that the scene will shift to move that specific object toward the center or near-center of the viewing window. If, for example, the target scene object is located upward and rightward of the current perspective, the camera eye, and therefore the scene, will likely next shift upward and rightward, and the load balancing module can optimize the server band partitioning for that tracking vector.
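The prediction described above can be sketched as a direction computed from the center of the viewing window toward the target scene object. This is an illustrative assumption only: the function name `predicted_tracking_direction`, the use of two-dimensional window coordinates, and the convention that y increases upward are not taken from the specification.

```python
import math


def predicted_tracking_direction(target_xy, center_xy):
    """Unit vector from the window center toward the target scene object.

    Both points are (x, y) pairs in window coordinates, with y assumed to
    increase upward. Returns (0.0, 0.0) when the target is already centered.
    """
    dx = target_xy[0] - center_xy[0]
    dy = target_xy[1] - center_xy[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return (0.0, 0.0)
    return (dx / length, dy / length)


# A target located up and to the right of center predicts an upward, rightward shift.
direction = predicted_tracking_direction((3.0, 4.0), (0.0, 0.0))
```

A load balancing module could combine such a direction with the target's relative distance to estimate how far the scene is likely to shift, although the specification leaves that combination open.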

As such, in one embodiment, load balancing module 204 uses the camera eye movement information and the scene model data to adjust the server band partitioning in advance, which tends to equalize the computational load across the compute servers. In one embodiment, load balancing module 204 uses the tracking vector, target scene object, and relative distance to determine the magnitude of the server band partitioning adjustments. In one embodiment, the magnitude of the server band partitioning adjustments is a measure of the “aggressiveness” of a server band partitioning.

Generally, having partitioned the frame into server bands, client 200 distributes the server bands to their assigned compute servers. Client 200 receives processed display bands from the compute servers in return. In one embodiment, client 200 determines the response time for each compute server. In an alternate embodiment, client 200 receives reported response times from each compute server.

Client 200 also includes cache 206. Cache 206 is an otherwise conventional cache. Generally, client 200 stores processed and unprocessed frames, and other information, in cache 206.

Client 200 also includes decompressor 208. In one embodiment, client 200 receives compressed processed server bands from the compute servers. As such, decompressor 208 is configured to decompress compressed processed server bands.

Client 200 also includes display interface 210, user interface 212, and network interface 214. Display interface 210 is an otherwise conventional display interface, configured to interface with a display, such as display 120 of FIG. 1, for example. User interface 212 is an otherwise conventional user interface, configured, for example, as user interface 130 of FIG. 1. Network interface 214 is an otherwise conventional network interface, configured to interface with a network, such as network 140 of FIG. 1, for example.

As described above, client 200 is a graphics client, such as graphics client 110 of FIG. 1, for example. Accordingly, client 200 transmits raw server bands to compute servers for rendering and receives processed display bands for display. FIG. 3 illustrates an exemplary compute server in accordance with one embodiment of the present invention.

In particular, FIG. 3 is a block diagram illustrating an exemplary compute server 300 in accordance with one embodiment of the present invention. Server 300 includes control processing unit (PU) 302. As illustrated, control PU 302 is an otherwise conventional processing unit, configured to operate as described below.

Server 300 also includes a plurality of processing elements (PEs) 310. Generally, each PE 310 is an otherwise conventional PE, configured with a local store 312. As described in more detail below, each PE 310 receives a PE block for rendering, renders the PE block, and returns a rendered PE block to the control PU 302.

Server 300 also includes load balancing module 304. Generally, control PU 302 and load balancing module 304 partition a received raw display band into a plurality of PE blocks based on a PE load balancing factor. In particular, in one embodiment, load balancing module 304 is configured to set and modify a PE load balancing factor based on PE response times. In an alternate embodiment, the PE load balancing factor includes a prospective rendering factor, and load balancing module 304 is configured to modify the PE load balancing factor based on PE response times and user input.

In one embodiment, load balancing module 304 divides the received raw display band into PE blocks comprising the frame data and control PU 302 transmits the divided frame data to the PEs for rendering. In an alternate embodiment, control PU 302 transmits coordinate information demarcating the boundaries of each PE block. In one embodiment, the coordinate information comprises coordinates referring to a cached (and commonly accessible) frame.

Generally, having partitioned the raw display bands into PE blocks, server 300 distributes the PE blocks to their assigned PEs. The PEs 310 render their received PE blocks and return rendered PE blocks to control PU 302. In one embodiment, each PE 310 stores a rendered PE block in cache 306 and indicates to control PU 302 that the PE has completed rendering its PE block.

As such, server 300 also includes cache 306. Cache 306 is an otherwise conventional cache. Generally, server 300 stores processed and unprocessed bands, PE blocks, and other information, in cache 306.

Server 300 also includes compressor 308. In one embodiment, the graphics client receives compressed processed server bands from the compute servers. As such, compressor 308 is configured to compress processed display bands for transmission to the graphics client.

Server 300 also includes network interface 314. Network interface 314 is an otherwise conventional network interface, configured to interface with a network, such as network 140 of FIG. 1, for example.

Generally, server 300 receives raw display bands from a graphics client. Control PU 302 and load balancing module 304 divide the received display band into PE blocks based on a PE load balancing factor. The PEs 310 render their assigned blocks and control PU 302 assembles the rendered PE blocks into a processed display band. Compressor 308 compresses the processed display band and server 300 transmits the processed display band to the graphics client.

In one embodiment, control PU 302 adjusts the PE load balancing factor based on the rendering times for each PE 310. In one embodiment, control PU 302 also determines a total rendering time for the entire display band and reports the total rendering time to the graphics client. Thus, generally, server 300 can modify the PE load balancing factor to adapt to changing loads on the PEs.

Thus, server 300 can balance the rendering load between the PEs, which in turn helps reduce response time. The operations of the graphics client and the compute server are described in additional detail below. More particularly, the operation of an exemplary graphics client is described with respect to FIG. 4, and the operation of an exemplary compute server is described with respect to FIG. 5.

FIG. 4 illustrates one embodiment of a method for photorealistic imaging workload distribution. Specifically, FIG. 4 illustrates a high-level flow chart 400 that depicts logical operational steps performed by, for example, system 200 of FIG. 2, which may be implemented in accordance with a preferred embodiment. Generally, control PU 202 performs the steps of the method, unless indicated otherwise.

As indicated at block 405, the process begins, wherein system 200 receives a digital graphic image frame comprising scene model data for display. For example, system 200 can receive a frame from a user or other input. Next, as illustrated at block 410, system 200 receives user input. As described above, in one embodiment, user input includes camera movement information.

Next, as illustrated at block 415, system 200 sets or modifies a server load balancing factor based on the received frame. Next, as illustrated at block 420, system 200 sets or modifies a prospective rendering factor based on received user input and scene model data. Next, as illustrated at block 425, system 200 partitions the frame into server bands based on the server load balancing factor and the prospective rendering factor.

Based on the user input and the prospective rendering factor, system 200 is aware of the direction and speed of the camera eye's motion. As such, system 200 can pre-adjust the server workload without having to rely exclusively on reactive adjustments. For example, if the user “looks” up or down (moving the camera eye vertically), system 200 can decrease the size of the region assigned to the compute server on the leading edge to account for the new model geometry that is about to be introduced into the scene.
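The leading-edge adjustment described above might be sketched as follows, assuming horizontal server bands listed from top to bottom and a band layout expressed as per-server shares of the frame. The function name `pre_adjust_bands`, the `direction` strings, and the equal redistribution of the removed share are hypothetical choices made for illustration.

```python
def pre_adjust_bands(shares, direction, aggressiveness):
    """Shrink the leading-edge band ahead of a vertical camera move.

    shares         -- per-server shares of the frame, ordered top to bottom
    direction      -- "up" or "down", the direction in which the eye is moving
    aggressiveness -- fraction (0..1) of the leading band's share to give away
    """
    if len(shares) < 2:
        return list(shares)
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    if not 0.0 <= aggressiveness <= 1.0:
        raise ValueError("aggressiveness must be between 0 and 1")

    # New geometry enters at the top when looking up, at the bottom when looking down.
    lead = 0 if direction == "up" else len(shares) - 1

    adjusted = list(shares)
    removed = adjusted[lead] * aggressiveness
    adjusted[lead] -= removed

    # Hand the removed share to the other bands in equal parts.
    extra = removed / (len(shares) - 1)
    for i in range(len(adjusted)):
        if i != lead:
            adjusted[i] += extra
    return adjusted


# Four equal bands; the user looks up, so the top band gives up 20% of its rows.
adjusted = pre_adjust_bands([0.25, 0.25, 0.25, 0.25], "up", 0.2)
```

Because the total share is preserved, the result can feed directly into whatever step converts shares into band boundaries.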

Moreover, system 200 can adjust how aggressively to rebalance the workload based on the speed of the eye motion. If the camera eye is moving more quickly, system 200 can adjust the workload more aggressively. If the camera eye is moving more slowly, system 200 can adjust the workload less aggressively.

Additionally, system 200 can tailor workload rebalancing according to the type of eye movement demonstrated by the user input. That is, different types of eye movement respond best to different adjustment patterns. For example, zooming in or moving along the eye vector leads to less of an imbalance across compute servers. As such, system 200 can adjust the workload less aggressively in response to a rapid zoom function, for example, than in response to a rapid pan function.
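The two preceding paragraphs can be combined in a small sketch that scales aggressiveness with eye speed and then discounts it by movement type. The weights in `MOTION_WEIGHT`, the `max_speed` normalization, and the function name `rebalance_aggressiveness` are illustrative assumptions; the specification states only the qualitative relationships (faster motion is more aggressive, zoom is less aggressive than pan).

```python
# Hypothetical per-motion weights: a zoom unbalances the servers less than a pan.
MOTION_WEIGHT = {"pan": 1.0, "zoom": 0.4}


def rebalance_aggressiveness(speed, motion_type, max_speed=1.0):
    """Return a value in [0, 1] describing how strongly to pre-adjust bands.

    speed       -- magnitude of the eye movement, in the same units as max_speed
    motion_type -- "pan" or "zoom"
    max_speed   -- speed at or above which the adjustment saturates (assumed)
    """
    if motion_type not in MOTION_WEIGHT:
        raise ValueError("unknown motion type: " + motion_type)
    if max_speed <= 0:
        raise ValueError("max_speed must be positive")

    # Faster eye motion means a more aggressive adjustment, clamped to [0, 1].
    base = min(max(speed / max_speed, 0.0), 1.0)
    return base * MOTION_WEIGHT[motion_type]


# A rapid zoom is handled less aggressively than an equally rapid pan.
zoom_level = rebalance_aggressiveness(0.9, "zoom")
pan_level = rebalance_aggressiveness(0.9, "pan")
```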

In one embodiment, system 200 partitions the frame into horizontal server bands. In an alternate embodiment, system 200 partitions the frame into vertical server bands. In an alternate embodiment, system 200 partitions the frame into horizontal or vertical server bands, depending on which alignment yields the more effective (load balancing) partitioning.

Next, as illustrated at block 430, system 200 distributes the server bands to compute servers. Next, as illustrated at block 435, system 200 receives compressed processed display bands from the compute servers. Next, as illustrated at block 440, system 200 decompresses the received compressed processed display bands.

Next, as illustrated at block 445, system 200 assembles a processed frame based on the processed display bands. Next, as illustrated at block 450, system 200 stores the processed frame. Next, as illustrated at block 455, system 200 displays an image based on the processed frame. As described above, in one embodiment, system 200 transmits the processed frame to a display module for display.
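The assembly step can be sketched as follows, assuming horizontal bands that each arrive tagged with their starting row and carry their pixel rows as a list. The band representation and the name assemble_frame are hypothetical.

```python
def assemble_frame(bands):
    """Concatenate processed bands into one frame, ordered by starting row.

    bands: iterable of (start_row, rows) pairs, in any arrival order.
    """
    frame = []
    for start, rows in sorted(bands, key=lambda band: band[0]):
        # Each band must begin exactly where the previous one ended.
        if start != len(frame):
            raise ValueError(f"gap or overlap at row {start}")
        frame.extend(rows)
    return frame
```

Sorting by starting row lets the graphics client accept bands in whatever order the compute servers finish.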

Next, as illustrated at block 460, system 200 receives reported rendering times from the compute servers. Next, as illustrated at block 465, system 200 modifies the server load balancing factor based on the reported rendering times. The process returns to block 405, wherein the graphics client receives a frame for processing.
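One way the reactive adjustment at block 465 could work is sketched below. It assumes that each server's rendering time scales with its share of the frame, estimates each server's throughput as share divided by time, and moves the shares partway toward the throughput-proportional target. The damping parameter alpha and the name rebalance_shares are illustrative assumptions, not part of the disclosed embodiments.

```python
def rebalance_shares(shares, times, alpha=0.5):
    """Move band shares toward equal rendering times across servers.

    shares: current fractional shares, summing to 1.
    times: reported rendering time per server (must be positive).
    alpha: damping in [0, 1]; 0 keeps the shares, 1 jumps to the target.
    """
    # Estimated throughput: work completed per unit of rendering time.
    rates = [s / t for s, t in zip(shares, times)]
    total = sum(rates)
    target = [r / total for r in rates]
    # Blend toward the target to avoid oscillating between frames.
    return [(1 - alpha) * s + alpha * g for s, g in zip(shares, target)]
```

For example, if two servers hold equal shares and the first takes twice as long as the second, the sketch shifts part of the first server's share to the second.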

FIG. 5 illustrates one embodiment of a method for photorealistic imaging workload distribution. Specifically, FIG. 5 illustrates a high-level flow chart 500 that depicts logical operational steps performed by, for example, system 300 of FIG. 3, which may be implemented in accordance with a preferred embodiment. Generally, control PU 302 performs the steps of the method, unless indicated otherwise.

As illustrated at block 505, the process begins, wherein a compute server receives a raw display band from a graphics client. For example, system 300 of FIG. 3 receives a raw display band from a graphics client 200 of FIG. 2. Next, as illustrated at block 510, system 300 partitions the raw display band into PE blocks based on a PE load balancing factor.

In one embodiment, the raw display band includes camera movement information and system 300 partitions the raw display band into PE blocks based on a PE load balancing factor and the camera movement information. In one embodiment, system 300 partitions the raw display band in a similar fashion as does system 200 as described with respect to block 425, above. Accordingly, system 300 can dynamically partition the raw display band to account for prospective changes in the composition of the frame image, helping to maintain load balance between the PEs.

Next, as illustrated at block 515, system 300 distributes the PE blocks to the processing elements. For example, control PU 302 distributes the PE blocks to one or more PEs 310. Next, as illustrated at block 520, each PE renders its received PE block. For example, the PEs 310 render their received PE blocks.

Next, as illustrated at block 525, control PU 302 receives the rendered PE blocks from the PEs 310. As described above, in one embodiment, control PU 302 receives a notification from the PEs 310 that the rendered blocks are available in cache 306. Next, as illustrated at block 530, system 300 combines the rendered PE blocks into a processed display band.

Next, as illustrated at block 535, system 300 compresses the processed display band for transmission to the graphics client. For example, compressor 308 compresses the processed display band for transmission to the graphics client. Next, as illustrated at block 540, system 300 transmits the compressed display band to the graphics client.
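The compression at block 535 and the matching decompression at the graphics client can be illustrated with a lossless round trip. The disclosure does not name a compression scheme; the sketch below substitutes the DEFLATE implementation in Python's standard zlib module purely as an assumption, and the function names are hypothetical.

```python
import zlib

def compress_band(pixels: bytes) -> bytes:
    """Compress a processed display band before transmission."""
    return zlib.compress(pixels)

def decompress_band(payload: bytes) -> bytes:
    """Recover the processed display band at the graphics client."""
    return zlib.decompress(payload)
```

Because the scheme in this sketch is lossless, the graphics client recovers exactly the bytes that the compute server produced.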

Next, as illustrated at block 545, system 300 determines a render time for each PE. For example, control PU 302 determines a render time for each PE 310. Next, as illustrated at block 550, system 300 reports the rendering time to the graphics client. In one embodiment, system 300 calculates the total rendering time for the processed display band, based on the slowest PE, and reports the total rendering time to the graphics client. In an alternate embodiment, system 300 reports the rendering time for each PE to the graphics client.
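The two reporting embodiments can be sketched as follows. Because the PEs render their blocks in parallel, the sketch takes the total rendering time for the band to be the time of the slowest PE. The dictionary layout and the name build_time_report are assumptions made for illustration.

```python
def build_time_report(pe_times, per_pe=False):
    """Build the rendering time report sent to the graphics client.

    pe_times: measured rendering time for each PE.
    per_pe: if True, report every PE's time; otherwise report one total.
    """
    if per_pe:
        return {"pe_times": list(pe_times)}
    # The band is complete only when the slowest PE has finished.
    return {"total_time": max(pe_times)}
```

For PE times of 3.0, 5.5, and 4.1, the single-total embodiment in this sketch reports 5.5.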

Next, as illustrated at block 555, system 300 adjusts the PE load balancing factor based on the rendering time for each PE. As described above, system 300 can set the PE load balancing factor to divide the workload among the PEs such that each PE takes approximately the same amount of time to complete its rendering task.

Accordingly, the disclosed embodiments provide numerous advantages over other methods and systems. For example, the disclosed embodiments improve balanced workload distribution over current approaches, especially work-stealing systems. Because the disclosed embodiments better distribute the computational workload, work-stealing is unnecessary, and the computational units can retain relevant cache data without also incurring the penalties inherent in re-tasking a processing element under common work-stealing schema.

More specifically, the disclosed embodiments improve the balance of photorealistic imaging workload distribution, especially in ray tracing applications. By actively managing the computationally intensive regions of a frame, and stalling the computational units waiting for the next frame, the rendering system spends less time stalled for data.

Further, the disclosed embodiments offer methods that maintain focus of a computational unit on a particular region, even as that region is expanded or reduced to maintain relative workload. As such, any particular computational unit is more likely to retain useful frame data in its cache, which improves cache hit rates. Moreover, the improved cache hit rates overcome the slightly increased intra-frame stalls, improving the overall rendering time.

Additionally, the disclosed embodiments provide a system and method that dynamically adjusts the workload based on prospective rendering tasking. As such, the disclosed embodiments can reduce the performance impact of a rapidly moving camera eye by anticipating changes in the computational intensity of regions in the scene. Other technical advantages will be apparent to one of ordinary skill in the relevant arts.

As described above, one or more embodiments described herein may be practiced or otherwise embodied in a computer system. Generally, the term “computer,” as used herein, refers to any automated computing machinery. The term “computer” therefore includes not only general purpose computers such as laptops, personal computers, minicomputers, and mainframes, but also devices such as personal digital assistants (PDAs), network enabled handheld devices, internet or network enabled mobile telephones, and other suitable devices. FIG. 6 is a block diagram providing details illustrating an exemplary computer system employable to practice one or more of the embodiments described herein.

Specifically, FIG. 6 illustrates a computer system 600. Computer system 600 includes computer 602. Computer 602 is an otherwise conventional computer and includes at least one processor 610. Processor 610 is an otherwise conventional computer processor and can comprise a single-core, dual-core, central processing unit (PU), synergistic PU, attached PU, or other suitable processors.

Processor 610 couples to system bus 612. Bus 612 is an otherwise conventional system bus. As illustrated, the various components of computer 602 couple to bus 612. For example, computer 602 also includes memory 620, which couples to processor 610 through bus 612. Memory 620 is an otherwise conventional computer main memory, and can comprise, for example, random access memory (RAM). Generally, memory 620 stores applications 622, an operating system 624, and access functions 626.

Generally, applications 622 are otherwise conventional software program applications, and can comprise any number of typical programs, as well as computer programs incorporating one or more embodiments of the present invention. Operating system 624 is an otherwise conventional operating system, and can include, for example, Unix, AIX, Linux, Microsoft Windows™, MacOS™, and other suitable operating systems. Access functions 626 are otherwise conventional access functions, including networking functions, and can be included in operating system 624.

Computer 602 also includes storage 630. Generally, storage 630 is an otherwise conventional device and/or devices for storing data. As illustrated, storage 630 can comprise a hard disk 632, flash or other non-volatile memory 634, and/or optical storage devices 636. One skilled in the art will understand that other storage media can also be employed.

An I/O interface 640 also couples to bus 612. I/O interface 640 is an otherwise conventional interface. As illustrated, I/O interface 640 couples to devices external to computer 602. In particular, I/O interface 640 couples to user input device 642 and display device 644. Input device 642 is an otherwise conventional input device and can include, for example, mice, keyboards, numeric keypads, touch sensitive screens, microphones, webcams, and other suitable input devices. Display device 644 is an otherwise conventional display device and can include, for example, monitors, LCD displays, GUI screens, text screens, touch sensitive screens, Braille displays, and other suitable display devices.

A network adapter 650 also couples to bus 612. Network adapter 650 is an otherwise conventional network adapter, and can comprise, for example, a wireless, Ethernet, LAN, WAN, or other suitable adapter. As illustrated, network adapter 650 can couple computer 602 to other computers and devices 652. Other computers and devices 652 are otherwise conventional computers and devices typically employed in a networking environment. One skilled in the art will understand that there are many other networking configurations suitable for computer 602 and computer system 600.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

One skilled in the art will appreciate that variations of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Additionally, various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art, which are also intended to be encompassed by the following claims.

1. A method, comprising: receiving, by a graphics client, a frame, the frame comprising scene model data; setting a server load balancing factor based on the scene model data; setting a prospective rendering factor based on the scene model data; partitioning the frame into a plurality of server bands based on the server load balancing factor and the prospective rendering factor; distributing the plurality of server bands to a plurality of compute servers; receiving processed server bands from the plurality of compute servers; assembling a processed frame based on the received processed server bands; and transmitting the processed frame for display to a user as an image.
2. The method of claim 1, further comprising: receiving user input; and wherein setting the prospective rendering factor further comprises setting the prospective rendering factor based on the scene model data and received user input.
3. The method of claim 1, wherein partitioning the frame further comprises selecting between horizontal server bands and vertical server bands.
4. The method of claim 1, further comprising: receiving reported rendering times from at least one of the plurality of servers; and wherein setting the server load balancing factor further comprises setting the server load balancing factor based on the scene model data and the reported rendering times.
5. The method of claim 1, wherein assembling a processed frame band further comprises decompressing the received processed server bands.
6. A computer program product for processing a digitized graphic frame, the computer program product stored on a computer usable medium having computer usable program code embodied therewith, the computer useable program code comprising: computer usable program code configured to receive a frame, the frame comprising scene model data; computer usable program code configured to set a server load balancing factor based on the scene model data; computer usable program code configured to set a prospective rendering factor based on the scene model data; computer usable program code configured to partition the frame into a plurality of server bands based on the server load balancing factor and the prospective rendering factor; computer usable program code configured to distribute the plurality of server bands to a plurality of compute servers; computer usable program code configured to receive processed server bands from the plurality of compute servers; computer usable program code configured to assemble a processed frame based on the received processed server bands; and computer usable program code configured to transmit the processed frame for display to a user as an image.
7. The computer program product of claim 6, further comprising: computer usable program code configured to receive user input; and wherein setting the prospective rendering factor further comprises setting the prospective rendering factor based on the scene model data and received user input.
8. The computer program product of claim 6, wherein partitioning the frame further comprises selecting between horizontal server bands and vertical server bands.
9. The computer program product of claim 6, further comprising: computer usable program code configured to receive reported rendering times from at least one of the plurality of servers; and wherein setting the server load balancing factor further comprises setting the server load balancing factor based on the scene model data and the reported rendering times.
10. The computer program product of claim 6, wherein assembling a processed frame band further comprises decompressing the received processed server bands.
11. A method, comprising: receiving, by a compute server, a raw display band, the raw display band comprising scene model data; the compute server comprising a plurality of processing elements (PEs); partitioning the raw display band into a plurality of PE blocks based on a PE load balancing factor; distributing the plurality of PE blocks to the plurality of PEs; rendering, by each PE, the PE blocks, to generate rendered PE blocks; combining, by the compute server, the rendered PE blocks, to generate a processed display band; determining, by the compute server, a rendering time for each PE; modifying the PE load balancing factor based on the determined rendering times; and transmitting the processed display band to a graphics client.
12. The method of claim 11, wherein transmitting comprises compressing the processed display band.
13. The method of claim 11, further comprising reporting a rendering time to the graphics client based on the determined rendering times.
14. The method of claim 11, further comprising: wherein the raw display band further comprises prospective rendering input; and wherein partitioning the raw display band comprises partitioning based on the PE load balancing factor and the prospective rendering input.
15. The method of claim 11, wherein modifying the PE load balancing factor further comprises modifying the PE load balancing factor based on the determined rendering times and received prospective rendering input.
16. A computer program product for processing a digitized graphic frame, the computer program product stored on a computer usable medium having computer usable program code embodied therewith, the computer useable program code comprising: computer usable program code configured to receive a raw display band, the raw display band comprising scene model data; computer usable program code configured to partition the raw display band into a plurality of PE blocks based on a PE load balancing factor; computer usable program code configured to distribute the plurality of PE blocks to a plurality of PEs; computer usable program code configured to render, by each PE, the PE blocks, to generate rendered PE blocks; computer usable program code configured to combine the rendered PE blocks, to generate a processed display band; computer usable program code configured to determine a rendering time for each PE; computer usable program code configured to modify the PE load balancing factor based on the determined rendering times; and computer usable program code configured to transmit the processed display band to a graphics client.
17. The computer program product of claim 16, wherein transmitting comprises compressing the processed display band.
18. The computer program product of claim 16, further comprising computer usable program code configured to report a rendering time to the graphics client based on the determined rendering times.
19. The computer program product of claim 16, further comprising: wherein the raw display band further comprises prospective rendering input; and wherein partitioning the raw display band comprises partitioning based on the PE load balancing factor and the prospective rendering input.
20. The computer program product of claim 16, wherein modifying the PE load balancing factor further comprises modifying the PE load balancing factor based on the determined rendering times and received prospective rendering input.
21. A system comprising a graphics client, the graphics client configured to: receive a frame, the frame comprising scene model data; set a server load balancing factor based on the scene model data; set a prospective rendering factor based on the scene model data; partition the frame into a plurality of server bands based on the server load balancing factor and the prospective rendering factor; distribute the plurality of server bands to a plurality of compute servers; receive processed server bands from the plurality of compute servers; assemble a processed frame based on the received processed server bands; and transmit the processed frame for display to a user as an image.
22. The system of claim 21, further comprising: wherein the graphics client is further configured to receive user input; and wherein setting the prospective rendering factor further comprises setting the prospective rendering factor based on the scene model data and received user input.
23. The system of claim 21, further comprising: wherein the graphics client is further configured to receive reported rendering times from at least one of the plurality of servers; and wherein setting the server load balancing factor further comprises setting the server load balancing factor based on the scene model data and the reported rendering times.
24. The system of claim 21, further comprising: a plurality of compute servers, each compute server coupled to the graphics client and comprising a plurality of processing elements (PEs), and each compute server configured to: receive a raw display band from the graphics client, the raw display band comprising scene model data; partition the raw display band into a plurality of PE blocks based on a PE load balancing factor; and distribute the plurality of PE blocks to the plurality of PEs; wherein each PE is configured to render the PE blocks, to generate rendered PE blocks; and wherein each compute server is further configured to: combine the rendered PE blocks rendered by that compute server's PEs, to generate a processed display band; determine a rendering time for each of that compute server's PEs; modify the PE load balancing factor based on the determined rendering times; and transmit the processed display band to the graphics client.
25. The system of claim 24, further comprising: wherein the raw display band further comprises prospective rendering input; and wherein partitioning the raw display band comprises partitioning based on the PE load balancing factor and the prospective rendering input.